Saliva-based SARS-CoV-2 serology using at-home collection kits returned via mail

Serology provides tools for epidemiologic studies, and may have a role in vaccine prioritization and selection. Automated serologic testing of saliva, especially specimens that are self-collected at home and sent to a laboratory via the mail without refrigeration, could be a highly-scalable strategy for population-wide testing. In this prospective study, non-vaccinated patients were recruited after PCR testing to self-collect saliva and return their specimens via mail. Longitudinal specimens were analyzed in order to monitor seroconversion in the weeks after a diagnostic PCR test for SARS-CoV-2. Diverse users self-collected saliva and returned specimens via mail in compliance with shipping regulations. At our pre-established threshold (0.963 AU/mL), salivary IgG reactivity to full-length spike protein achieved 95.8% sensitivity and 92.4% specificity at 2–4 weeks after diagnostic testing, which is comparable to the typical sensitivity and specificity achieved for serum testing. Reactivity to N antigen also was detected with 92.6% sensitivity and 90.7% specificity at 4–8 weeks after diagnostic testing. Moreover, serologic testing for endemic coronaviruses performed in multiplex with SARS-CoV-2 antigens has the potential to identify samples that may require retesting due to effects of pre-analytical factors. The easy-to-use saliva collection kit, coupled with thresholds for positivity and methods of flagging samples for retest, provides a framework for large-scale serosurveillance of SARS-CoV-2.

who tested positive for SARS-CoV-2 infection by RT-PCR and 81 participants who tested negative (67%). As shown in Table 1, the participants were diverse in age, gender, ethnicity, and socioeconomic status (SES). The samples were collected from non-vaccinated individuals from July 2020 to March 2021 within the Washington, DC metropolitan area, prior to the broad availability of SARS-CoV-2 vaccines. Participants who tested positive had similar demographics to participants who tested negative. The groups were well balanced by gender and SES, as indicated by the area disadvantage index (ADI) 36 . Participants whose PCR test did not detect SARS-CoV-2 (PCR− participants) tended to be slightly older and more likely to be insured through Medicare than participants whose PCR test detected SARS-CoV-2 (PCR+ participants). This trend may result from a lower positivity rate among older asymptomatic patients undergoing pre-surgical clearance for outpatient surgery than among patients tested due to symptoms of COVID-19. Additionally, PCR+ participants were less likely to identify as non-Hispanic White, reflecting the rates of COVID-19 infection among different ethnic groups 37 .
Participants were instructed to return specimens immediately upon receiving an enrollment kit and then again at 10 and 30 days after their PCR test. Among participants who provided a specimen, the majority provided three specimens as instructed; however, the timing of collection and mailing in some cases varied considerably from the specified times. Samples were mailed by participants as early as 1 day and as late as 102 days after their RT-PCR test. The mode and median of days between PCR test and mailing the first sample were 9 and 11 days, respectively. To create groups with roughly equal numbers of samples, samples were divided into three time categories depending on whether they were mailed by participants < 2 weeks, 2-4 weeks, or 4-8 weeks after the PCR test ( Table 2).
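The grouping of samples into time categories can be illustrated with a short Python sketch. This is not the authors' code: the function name `time_category` and the exact handling of boundary days (14, 28, and 56 days) are assumptions made for illustration, since the text gives the windows only in weeks.

```python
from typing import Optional


def time_category(days_after_pcr: int) -> Optional[str]:
    """Assign a sample to a time window by days between PCR test and mailing.

    Assumption: windows are half-open in days (<14, 14-27, 28-56); the text
    specifies them only as "< 2 weeks, 2-4 weeks, or 4-8 weeks".
    """
    if days_after_pcr < 0:
        raise ValueError("days_after_pcr must be non-negative")
    if days_after_pcr < 14:
        return "<2 weeks"
    if days_after_pcr < 28:
        return "2-4 weeks"
    if days_after_pcr <= 56:
        return "4-8 weeks"
    # Mailed later than 8 weeks after the PCR test: outside the three categories.
    return None
```

Under these assumed boundaries, a sample mailed 9 days after the PCR test falls in the "<2 weeks" group, while a sample mailed 102 days after the test falls outside all three groups.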
The transit time of specimens from participants to the laboratory ranged from 1 to 31 days (1st quartile = 1.4 days, median = 1.9 days, 3rd quartile = 3.5 days) in the mail with 91% arriving in less than 5 days, which was our target window based on prior testing showing stability for 5 days 35 . Two specimens that were in the mail over 20 days were excluded from analysis of test performance, but were included in an analysis investigating potential indicators of sample degradation.

IgG reactivity to SARS-CoV-2 antigens.
Concentrations of salivary IgG reactive to coronavirus antigens are shown in Fig. 1. Antibody positivity for SARS-CoV-2 antigens was determined based on pre-established thresholds set at the 98th percentile for saliva self-collected from presumed naive participants (no PCR confirmed diagnosis, no household exposure, and no symptoms of COVID-19) in a previous study 35 . Pre-established IgG thresholds for Spike, RBD, and N were 0.963, 0.244, and 3.18 AU/mL, respectively.
Table 1. Demographics of study participants. p-values were computed using the chi-square test. Fisher's exact tests were also run in consideration of small cell sizes. In all cases, the p-values for the chi-square or Fisher's tests were > 0.05.
Clinical performance of the serology assays was determined relative to the COVID-19 PCR test result at enrollment. Sensitivity and specificity were calculated, respectively, as (i) the proportion of saliva specimens from PCR-confirmed cases with antibody levels above the pre-established thresholds and (ii) the proportion of saliva specimens from PCR-negative cases with antibody levels at or below the pre-established thresholds. Measured sensitivity and specificity values are provided in Table 3. The SARS-CoV-2 Spike IgG assay provided the best overall accuracy. The sensitivity was only 40.7% within two weeks of PCR testing, but increased to 96.0% at 2-4 weeks and 92.6% at 4-8 weeks after PCR testing. The specificity was 92.4%. By comparison, when the same assay was evaluated with serum samples in an independent study, the sensitivity and specificity were reported as 90.8% and 97.4%, respectively 38 . The SARS-CoV-2 N IgG assay performed similarly to the SARS-CoV-2 Spike assay with point estimates for sensitivity and specificity that were not statistically different. The SARS-CoV-2 RBD IgG assay exhibited similar sensitivity; however, the specificity was significantly poorer (Table 3), which may indicate that the pre-set assay threshold was not optimal (see discussion of threshold verification below). IgG reactivity to the SARS-CoV-2 spike protein was highly correlated with IgG reactivity to the N protein and the RBD of the spike protein, especially for samples from PCR-positive cases (Fig. 2). The correlation of the reactivities to RBD and Spike (Fig. 2a) shows that the concentrations of anti-RBD IgG antibodies tended to be about threefold lower than for full-length spike. As RBD is a fragment of the Spike protein, the difference in antibody activity is likely due to the reduced number of antigenic epitopes displayed for RBD relative to Spike. For PCR-negative individuals, measured reactivities of IgG to SARS-CoV-2 N tended to span a larger range than IgG to SARS-CoV-2 Spike, which was heavily skewed to the bottom of the assay range. This result may be a consequence of cross-reactive host antibodies from previous infections with other circulating coronaviruses, since the N protein has greater conservation across human coronaviruses than the Spike protein. However, the effect of any cross-reactivity on assay performance was small, with the N assay showing only a small and statistically non-significant decrease in specificity relative to the Spike assay.
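The definitions of sensitivity and specificity used in this section (positive means strictly above the threshold; negative means at or below it) can be written out as a minimal Python sketch. The function name and the example values in the usage note are hypothetical; only the Spike threshold of 0.963 AU/mL comes from the text.

```python
from typing import Sequence, Tuple

SPIKE_THRESHOLD_AU_ML = 0.963  # pre-established Spike IgG threshold from the text


def sensitivity_specificity(
    concentrations: Sequence[float],
    pcr_positive: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) relative to the PCR result.

    A specimen is antibody positive when its concentration is strictly above
    the threshold, and antibody negative when it is at or below the threshold.
    """
    if len(concentrations) != len(pcr_positive):
        raise ValueError("inputs must have the same length")
    tp = fn = tn = fp = 0
    for conc, is_case in zip(concentrations, pcr_positive):
        above = conc > threshold
        if is_case and above:
            tp += 1
        elif is_case:
            fn += 1
        elif above:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one PCR-positive and one PCR-negative specimen")
    return tp / (tp + fn), tn / (tn + fp)
```

For example, with hypothetical concentrations of 2.5, 0.5, and 4.0 AU/mL from PCR-positive donors and 0.1, 0.963, and 1.2 AU/mL from PCR-negative donors, the function returns a sensitivity and a specificity of 2/3 each at the Spike threshold; the specimen at exactly 0.963 AU/mL counts as antibody negative.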

We looked for evidence that elevated antibody levels in PCR-negative participants may have been due to undiagnosed infections. Of the 67 PCR-negative participants who provided at least two samples, 6 had at least one sample above the assay threshold for the SARS-CoV-2 Spike IgG assay. Of these 6 participants, 2 had salivary IgG levels above the threshold for SARS-CoV-2 Spike for all three of their samples. These participants also had salivary IgG levels above the threshold for SARS-CoV-2 N protein, which suggests an undiagnosed infection prior to enrollment. Of the same 6 participants, 2 had an initial negative sample and then showed delayed seroconversion after 30 days. The seroconversion was also observed using the SARS-CoV-2 N IgG assay, which suggests that these participants may have been infected at enrollment and received a false negative PCR test 39 , or they may have become infected after enrollment.
Verification of pre-established thresholds. Receiver operating characteristic (ROC) curves were generated (Fig. 3), and the area under the curve (AUC) values for the ROC curves were calculated (Supplemental Table 1) to compare the diagnostic performance of the serology assays at different times after nasal PCR testing and to confirm that the pre-determined thresholds were optimal for identifying infections. For all three SARS-CoV-2 antigens, the AUC was significantly greater for samples collected more than two weeks after PCR testing relative to samples collected within 2 weeks of testing, largely reflecting the higher sensitivity that was observed for the later samples (Table 3). ROC curves and the associated AUC values were not significantly different for samples collected 2-4 weeks and > 4 weeks after PCR testing, indicating that the assay achieved optimal diagnostic performance by the 2 week time point. The ROC curves for the SARS-CoV-2 Spike and N IgG assays were similar with AUC values of 0.926 and 0.916, respectively, for samples collected 4-8 weeks after PCR testing. The SARS-CoV-2 RBD assay provided poorer classification with an AUC value of 0.883 for the same samples.
To assess the validity of our pre-established thresholds, we computed the optimal thresholds using data only from this study by identifying thresholds that maximize the sum of sensitivity and specificity. The pre-established thresholds and the optimal thresholds for this study are compared graphically in Fig. 3. For the spike and N IgG assays, the pre-established thresholds were close to optimal and no significant improvement in sensitivity or specificity could be achieved by adjusting the threshold.
In contrast, the pre-determined threshold for the RBD IgG assay was lower than optimal and increasing the threshold from 0.244 AU/mL to 0.684 AU/mL greatly improved the specificity for samples collected 4 weeks or later after PCR testing from 64 to 87%, while causing a much smaller loss in sensitivity from 93 to 89%.
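The search for an optimal threshold, defined above as the threshold that maximizes the sum of sensitivity and specificity, can be sketched in Python as follows. This is an illustrative sketch and not the authors' R analysis: the function name, the use of observed concentrations as candidate thresholds, and the tie-breaking rule (the lowest threshold wins) are assumptions.

```python
from typing import Sequence, Tuple


def optimal_threshold(
    concentrations: Sequence[float],
    pcr_positive: Sequence[bool],
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing sens + spec.

    Candidate thresholds are the observed concentrations; a specimen is
    called positive when its concentration is strictly above the threshold.
    Ties are resolved in favor of the lowest threshold (an assumption).
    """
    cases = [c for c, p in zip(concentrations, pcr_positive) if p]
    controls = [c for c, p in zip(concentrations, pcr_positive) if not p]
    if not cases or not controls:
        raise ValueError("need both PCR-positive and PCR-negative specimens")

    best_score = -1.0
    best = (0.0, 0.0, 0.0)
    for threshold in sorted(set(concentrations)):
        sens = sum(c > threshold for c in cases) / len(cases)
        spec = sum(c <= threshold for c in controls) / len(controls)
        if sens + spec > best_score:
            best_score = sens + spec
            best = (threshold, sens, spec)
    return best
```

With hypothetical PCR-positive values of 0.3, 0.9, 1.5, and 2.0 AU/mL and PCR-negative values of 0.1, 0.2, 0.4, and 0.8 AU/mL, the sketch selects a threshold of 0.8 AU/mL, giving a sensitivity of 0.75 and a specificity of 1.0.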
Exploration of retest criteria. Although saliva collection is simple and intuitive, the potential for poor specimen quality should be addressed when specimens are self-collected without supervision and transported under uncontrolled conditions. We explored options for identifying specimens at high risk of providing inaccurate results, including (i) the measurement of salivary antibodies that are expected to be universally abundant due to vaccinations or common natural infections, (ii) the measurement of total salivary immunoglobulin levels, and (iii) the measurement of background assay signals in the absence of an antigen target.
Prior infection with endemic coronaviruses is common 40,41 , so we expected that all donors would have high levels of antibodies to at least one of the four pre-COVID-19 endemic coronaviruses 42 . The multiplexed antigen panel used to measure antibodies to SARS-CoV-2 antigens also measured antibodies against the spike antigens for the four pre-COVID-19 endemic coronaviruses HKU1, NL63, OC43, and 229E (Fig. 4a). Nearly all specimens had readily detectable levels of antibodies to a spike protein of at least one endemic coronavirus. As an aggregate metric of reactivity to endemic coronaviruses, we computed the geometric mean of salivary IgG for HKU1, NL63, OC43, and 229E. We flagged eight outlier samples with geometric means below 0.17 AU/mL, which is the geometric mean of the 5th percentiles of the salivary IgG for these four antigens measured in a prior study 35 . These outliers appear to result from an issue with sample collection or sample deterioration, as opposed to the lack of immunity to an endemic coronavirus due to the absence of previous exposure or from general immunosuppression. In all cases where donors provided at least one other sample, normal levels of antibodies for endemic coronaviruses were measured at another time point. Sample deterioration due to delayed transit time in the mail could explain some of the flagged outliers, but not all of them. Two of the flagged samples were the samples with the longest transit times (> 20 days due to a general slowdown in mail during a period of this study), but the other 6 flagged samples were received within the target range of 5 days. For samples received within the target time range, there was no clear dependence of measured antibody levels with transit time (Fig. 4b). We note that for the 8 flagged samples, 6 were true negatives for SARS-CoV-2 infection by PCR testing, so excluding these samples did not significantly impact the reported sensitivity or specificity.
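The aggregate metric described above can be sketched in Python as a geometric mean over the four endemic coronavirus spike antigens, with a flag raised below the 0.17 AU/mL cutoff. The function names and the dictionary layout are hypothetical; only the antigen names and the cutoff come from the text.

```python
import math
from typing import Mapping

ENDEMIC_ANTIGENS = ("HKU1", "NL63", "OC43", "229E")
FLAG_CUTOFF_AU_ML = 0.17  # cutoff stated in the text


def endemic_geometric_mean(igg_au_ml: Mapping[str, float]) -> float:
    """Geometric mean of salivary IgG (AU/mL) across the four endemic spikes."""
    values = [igg_au_ml[name] for name in ENDEMIC_ANTIGENS]
    if any(v <= 0 for v in values):
        raise ValueError("concentrations must be positive to take logarithms")
    # Averaging the logs and exponentiating avoids overflow in the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def flag_for_retest(igg_au_ml: Mapping[str, float]) -> bool:
    """True when the geometric mean falls below the cutoff."""
    return endemic_geometric_mean(igg_au_ml) < FLAG_CUTOFF_AU_ML
```

For a hypothetical specimen with 1.0, 4.0, 2.0, and 2.0 AU/mL for HKU1, NL63, OC43, and 229E, the geometric mean is 2.0 AU/mL and the specimen is not flagged. Because the metric multiplies the four values, a single strongly reactive antigen can keep a specimen above the cutoff even when the other three are low, which fits the expectation that donors react to at least one endemic coronavirus.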
We also measured the total concentration of salivary IgG, IgM and IgA using a separate assay panel run at a different dilution (Fig. 4c). Median concentrations of total IgG, IgA and IgM were 3.3 µg/mL, ≥ 200 µg/mL (the top of the assay dynamic range at the selected sample dilution) and 3.4 µg/mL, respectively, which are comparable to the values we measured previously 35 (1.8 µg/mL for IgG, 124 µg/mL for IgA and 3.7 µg/mL for IgM). Moreover, the IgG, IgA and IgM concentrations align with published ranges measured using a different assay and collection method (IgG range = 0.4-93 µg/mL 43 ; IgA = 50.2 ± 19.1 µg/mL 44 ; IgM = 0.5-13.0 µg/mL 45 ). Low observed levels for the endemic coronaviruses were generally associated with low levels of total immunoglobulin. Of the eight samples that were flagged for low antibody levels against the four endemic coronaviruses, six samples had undetectable total IgG levels at the sample dilution used for the total immunoglobulin measurement, and four samples provided the lowest measured levels of total IgA (Fig. 4c).
Non-specific binding is another potential source of measurement error. Bovine serum albumin (BSA) was included as an antigen in the multiplex as a negative control. Specific binding of anti-BSA antibodies in samples should not occur due to the high concentration of BSA present in the assay diluents; therefore, binding to the BSA element in the antigen array should be indicative of antibodies that are able to bind non-specifically to the array surface. Non-specific binding, as assessed by signal for the control spot coated with BSA, was generally low (average of 191 counts). Two specimens from the same PCR-negative donor were noted to exceed 5000 counts on the BSA-coated spot, whereas a third intermediate sample from the same donor showed low non-specific binding.

Discussion
We measured anti-SARS-CoV-2 antibodies in saliva self-collected at home by donors in the weeks following a nasal RT-PCR test. This is the first report to our knowledge of a "spit and mail" serology test for SARS-CoV-2. Importantly, we found that a diverse group of participants were able to self-collect saliva and send it for testing. Using only written instructions without in-person supervision or training, participants universally packaged their specimens according to UN3373 regulations. Specimens were of sufficient quality for analysis, and most participants provided multiple specimens. Multiplexing of antigens for SARS-CoV-2 alongside endemic coronaviruses identified approximately 3% of samples that appeared aberrant due to low levels of antibodies against the four pre-COVID-19 endemic coronaviruses, and those samples may require retesting due to poor sample quality. This study builds upon prior studies 18,19 that found general acceptance for self-collected specimens by demonstrating feasibility of a "spit and mail" approach using a highly scalable kit.
Among participants whose PCR tests were positive for SARS-CoV-2, the kinetics of changes in salivary antibodies closely paralleled the kinetics of serum antibodies. Antibodies appeared in saliva as early as the first week after PCR testing, and the concentration of salivary antibodies increased over the course of two months, which has also been reported for serum 46 . Our study indicates that measuring salivary antibodies can be an alternative to measurement using serum or finger-stick blood. Among PCR-positive participants who returned at least one sample 2-4 weeks after testing, 96% had detectable anti-SARS-CoV-2 spike IgG, which is notable since participants were recruited among patients with mild to moderate disease not requiring hospitalization. In a large study of individuals who recovered from SARS-CoV-2 infection, 91.1% (1,107 of 1,215) were seropositive based on serum testing using a different assay 47 . Testing among outpatients is more challenging since a systemic humoral response may be delayed or absent in some mild cases 48 , whereas seroconversion occurs in nearly 100% of hospitalized patients 46,49 . The sensitivity for detecting asymptomatic infection may be lower since individuals with mild cases tend to have less robust immune responses compared to individuals with severe disease 50-53 . Samples were classified as antibody positive or negative based on thresholds determined in a previous study of self-collected saliva samples. Using these thresholds, we found that our assays for IgG against SARS-CoV-2 Spike and N antigens provided good sensitivity and specificity for classifying subjects based on COVID PCR test results. The ability to simultaneously measure IgG to both N and Spike proteins may potentially distinguish individuals who have been infected with SARS-CoV-2 (reactivity to both N antigen and Spike) from those who have been vaccinated without a prior infection (reactivity to Spike only).
Poorer performance was observed for IgG to SARS-CoV-2 RBD. ROC analysis showed that the pre-determined threshold was not optimal. It is not clear why the optimal thresholds for the RBD IgG assay would be different for the earlier and current studies and why the difference would only affect one of the assays. The difference could be associated with differences in how self-collected samples were submitted for testing (collection in drop boxes vs. shipping by mail) or differences in the testing population (healthy individuals vs. individuals who presented for COVID testing and may have symptoms of a respiratory infection). A limitation of this study is that we did not examine matched serum or finger-stick blood samples. Correlation of salivary antibody concentrations with finger-stick blood antibodies would help to determine the reason for the few cases when antibody levels were not as expected based on PCR testing. In particular, some participants may have been infected with SARS-CoV-2 prior to the nasal swab PCR test. Another limitation is that we did not measure reactivity of salivary IgA to SARS-CoV-2 antigens, which could potentially provide information that is different and complementary to IgG reactivity when assessing immune status.
Due to a lack of validated criteria to exclude samples based on atypical antibody testing, we excluded only two samples that were affected by prolonged postal delays. Measuring antibodies against endemic coronaviruses was identified as a potential approach, which can be easily multiplexed with the SARS-CoV-2 serology measurements, to flag specimens with suspected technical issues. Establishing retest criteria is recommended before the assay can be used for clinical testing. A geometric mean of antibody levels to four endemic coronaviruses is a promising metric for flagging suspect samples, especially because multiplex assays can obtain this metric without additional sample volume or processing. Additional studies with matched serum samples, or saliva samples collected under ideal conditions, are needed to refine and validate methods for identifying samples that may be affected by collection or handling issues.
Overall, we show the feasibility of a "spit and mail" test for large-scale serologic testing. The easy-to-use kit could be used by a diverse group of participants. The assays require minimal sample handling and can be performed rapidly using an automated analyzer capable of high-throughput testing. Potential uses include epidemiologic studies that require identifying people who have previously been infected or vaccinated. Also, saliva tests may find utility for monitoring durability of immunity.

Methods and materials
Participant recruitment and enrollment. Participants were recruited among adult patients tested for SARS-CoV-2 infection at Kaiser-Permanente clinics in Maryland via RT-PCR testing of nasal swabs. After the diagnostic test was performed, patients were invited to participate. All tested patients were eligible to participate regardless of symptoms, exposure risk, or reason for testing (e.g., pre-surgical evaluation, contact tracing). Participants who agreed to participate were provided with an enrollment box containing instructions, informed consent form, and three self-collection kits. All participants were recruited via a telephone call with the exception of one, who was recruited at a testing clinic and provided an enrollment box. Participants recruited via phone were shipped an enrollment box that arrived within 1-3 days after enrollment. Informed consent was obtained from all participants via a signed barcoded informed consent form and HIPAA authorization, which were mailed back to Kaiser-Permanente researchers. Participants were instructed to self-collect saliva on the day they received the kit and then at 10 and 30 days after their test. Saliva was returned directly to Meso Scale Diagnostics, LLC. (MSD) via the provided pre-paid mailers. MSD tracked participants through bar codes associated with the self-collection kits and had no access to identifiable personal information. Study procedures were approved by the IRB of Kaiser-Permanente. The KPMAS IRB reviewed, approved, and monitored the data collection, management, and analysis protocol in accordance with relevant guidelines/regulations.
Collection and transport of saliva. As described previously 35 , saliva was collected into a 2 mL screw-cap centrifuge tube (Sarstedt #72.609 with screw cap 65.716.xxx) using the Saliva Collection Aid (SCA; Salimetrics 5016.02), which is a straw-like device cleared by the FDA for collection of samples from adults and children.
Centrifuge tubes were pre-labeled with barcodes that matched barcodes on informed consent forms in order to associate samples with participants in a deidentified manner. The screw cap contained an O-ring in order to be compliant with UN3373 Category B shipping requirements. Donors were instructed to wait 30 min after eating, drinking, or smoking before drooling into the tube according to the manufacturer's directions. Donors were also instructed to provide saliva without assistance from others. The donors capped the tube, and placed it inside a biohazard bag (VWR 11215-684) containing an absorbent pad (ThermoSafe ZORB66). Samples were collected from July 2020 to March 2021 within the Washington, DC metropolitan area, prior to the broad availability of SARS-CoV-2 vaccines.

To return their samples, donors were instructed to place the biohazard bag containing the tube of saliva into a peel-and-seal cardboard mailer (Stephen Gould; MSD-CFM-SM) bearing a UN3373 label (LabelMaster L380B) and pre-paid shipping label as described previously 35 .
Saliva receipt and storage. Samples were returned anonymously to MSD via USPS First Class Mail. Upon delivery to MSD, saliva was frozen promptly at ≤ −70 °C without further processing. The date that saliva was mailed was documented as the date that the sample was picked up as determined by the USPS tracking history. For the initial samples, the accuracy of the USPS tracking record was cross-verified against the date entered by the patient on the informed consent form.
Indirect serology. Indirect serology measurements were conducted using kits and reagents that are commercially available from MSD as described previously 35 . On the day of sample testing, saliva was thawed at room temperature. Saliva was centrifuged briefly to pull down any food particles or mucus. To assess sample quality, we visually verified that samples were saliva and not predominantly phlegm or mucus. Prior to analysis, saliva samples were diluted five-fold by combining 20 µL of sample with 80 µL of a sample diluent (MSD® Diluent 2). Samples were assayed in a 96-well plate format using MSD V-PLEX® COVID-19 Coronavirus Panel 2 kits for measuring IgG (K15369U) antibody responses. Each well of the plates included an antigen array that enabled the multiplexed measurement of antibody responses against nine different coronavirus antigens as well as bovine serum albumin (BSA) as a negative control. These included four SARS-CoV-2 antigens (nucleocapsid protein, spike protein, spike receptor binding domain (RBD) and spike N-terminal domain (NTD)) and spike proteins from five other coronaviruses (SARS-CoV-1 and the four endemic coronaviruses 229E, HKU1, NL63, and OC43). Assay protocols were run according to the manufacturer's recommended protocol for serum except for the use of sample diluents and dilution factors (as described above) that were optimized for saliva. Testing of saliva samples was carried out in an automated fashion using high-throughput automation developed at MSD. Time-to-result was approximately four hours.
For quantitation of antibody responses, an eight-point calibration curve was run in duplicate on all plates and the signals for each antigen were fit to a 1/Y²-weighted four parameter logistic (4PL) fit. Samples were run in duplicate and the antibody concentration against each antigen was calculated by back-fitting to the appropriate 4PL fit and correcting for dilution. The concentrations were presented in arbitrary units per mL (AU/mL) that were defined relative to the assigned values of the reference standard. Controls were also run in duplicate on each plate, including three serum-based controls (provided with the kit) and two saliva-based controls (pooled normal saliva sourced from Lee Biosolutions spiked with serum from COVID-19 patients).
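The back-fitting step can be illustrated with a minimal Python sketch of a 4PL curve, its inverse, and the dilution correction. The parameterization below (a, d, c, b) is one common form of the 4PL model and is an assumption, as are the class and function names; the instrument software may use a different convention. The `weighted_sse` function shows one plausible reading of 1/Y² weighting (residuals scaled by the observed signal); the optimizer that would minimize it is omitted.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FourPL:
    """4PL curve: signal = d + (a - d) / (1 + (conc / c) ** b)."""

    a: float  # signal at zero concentration
    d: float  # signal at saturating concentration
    c: float  # concentration at the inflection point
    b: float  # slope factor

    def signal(self, conc: float) -> float:
        return self.d + (self.a - self.d) / (1.0 + (conc / self.c) ** self.b)

    def concentration(self, signal: float) -> float:
        """Back-fit: invert the curve to recover concentration from signal."""
        low, high = sorted((self.a, self.d))
        if not low < signal < high:
            raise ValueError("signal is outside the range spanned by the curve")
        ratio = (self.a - self.d) / (signal - self.d)
        return self.c * (ratio - 1.0) ** (1.0 / self.b)


def weighted_sse(curve: FourPL, concs: Sequence[float], signals: Sequence[float]) -> float:
    """Sum of squared residuals weighted by 1/Y^2 (Y = observed signal, assumed)."""
    return sum(((y - curve.signal(x)) / y) ** 2 for x, y in zip(concs, signals))


def sample_concentration(curve: FourPL, mean_signal: float, dilution_factor: float = 5.0) -> float:
    """Back-fit the signal and correct for the five-fold saliva dilution."""
    return curve.concentration(mean_signal) * dilution_factor
```

As a usage example with hypothetical curve parameters, `FourPL(a=100.0, d=1.0e6, c=10.0, b=1.2)` maps a diluted concentration of 3.0 AU/mL to a signal, and `sample_concentration` applied to that signal returns approximately 15 AU/mL after the five-fold dilution correction. Signals outside the range spanned by the curve cannot be inverted and raise an error in this sketch.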

Measurement of total salivary antibodies.
Total levels of IgG, IgM, and IgA immunoglobulin were measured using MSD's Isotyping Panel 1 Human/NHP Kit (K15203D) according to the manufacturer's directions as described previously 35 . Saliva was run at a dilution of 1,000-fold. Calibration and quantitation were carried out as described above for the indirect serology measurements.
Data analysis. Data processing was performed in Excel. Graphing and statistical analysis were done in R.
Concentrations below an assay's limit of detection (LOD) were assigned concentration values equal to the LOD, which was calculated as the concentration corresponding to 2.5 standard deviations above the assay's background signal. Concentrations exceeding the top calibrator were assigned the concentration of the top calibrator.
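The handling of out-of-range values described above amounts to clamping each concentration between the LOD and the top calibrator. A minimal Python sketch follows; the function names are hypothetical, and the use of the sample standard deviation of replicate blank signals is an assumption, since the text does not say how the background variability was estimated.

```python
import statistics
from typing import Sequence


def lod_signal(blank_signals: Sequence[float], n_sd: float = 2.5) -> float:
    """Signal at the LOD: mean background plus 2.5 standard deviations.

    Assumption: the sample standard deviation of replicate blanks is used.
    The LOD concentration is then obtained by back-fitting this signal to
    the calibration curve (not shown here).
    """
    return statistics.mean(blank_signals) + n_sd * statistics.stdev(blank_signals)


def censor_concentration(conc: float, lod: float, top_calibrator: float) -> float:
    """Assign the LOD to values below it and the top calibrator to values above it."""
    if lod > top_calibrator:
        raise ValueError("LOD must not exceed the top calibrator concentration")
    return min(max(conc, lod), top_calibrator)
```

With a hypothetical LOD of 0.05 and a top calibrator of 200, a measured value of 0.01 is reported as 0.05, a value of 500 is reported as 200, and a value of 3.3 is left unchanged. Censored values of this kind behave as bounds, not exact measurements, which is why the median total IgA is reported in the text as ≥ 200 µg/mL.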

Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.